docs: EKS cluster governance at scale — status, roadmap, and AI agents #2

Open
wants to merge 3 commits into
base: main

arnol377
Collaborator

Summary

Post-demo design document capturing current state, gaps, and the full roadmap for governing and scaling the EKS cluster fleet.

What's in this doc

  • Part 1 — Current state forensics: what the provisioning pipeline does, how clusters/<name>/main.tf serves both as cluster metadata and injection location map, the _apps-adsd-eks account repo pattern
  • Part 2 — Target state (4 bullet points)
  • Part 3 — Simplest possible plan:
    • Step 0: decouple repo_name from cluster_name
    • Step 0a (new): restructure clusters/ into clusters/<lifecycle>/<team>/<name>/ hierarchy; thread team/lifecycle params through SC → Lambda → CodeBuild
    • Step 1: update_all_clusters.py with --lifecycle, --team, --force flags
    • Steps 2–5: governance, README, fleet workspace, injection decision
  • Part 4 — Four AI agents + four Copilot skills for cognitive scale
  • All file/path references updated to GHE URLs
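
To make Step 0a and Step 1 concrete, here is a minimal sketch of how update_all_clusters.py might select clusters from the clusters/<lifecycle>/<team>/<name>/ hierarchy. It is not the script from this PR: the function names, the --root option, and the reading of --force as "update even when nothing changed" are all assumptions made for illustration.

```python
import argparse
from pathlib import Path


def find_clusters(root, lifecycle=None, team=None):
    """Return cluster dirs under root/<lifecycle>/<team>/<name>/ that contain main.tf.

    A filter left as None matches every value at that level of the tree.
    """
    pattern = f"{lifecycle or '*'}/{team or '*'}/*/main.tf"
    return sorted(p.parent for p in Path(root).glob(pattern))


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Select clusters to update")
    # --root is a hypothetical convenience option, not named in the design doc.
    parser.add_argument("--root", default="clusters")
    parser.add_argument("--lifecycle", help="for example dev or prod; omit for all")
    parser.add_argument("--team", help="for example adsd or ois; omit for all")
    # Assumed meaning only; the sketch parses the flag but does not act on it.
    parser.add_argument("--force", action="store_true",
                        help="update even when nothing changed")
    return parser.parse_args(argv)


def main(argv=None):
    """Parse the flags and return the matching cluster directories."""
    args = parse_args(argv)
    return find_clusters(args.root, args.lifecycle, args.team)
```

Under these assumptions, `main(["--lifecycle", "dev", "--team", "adsd"])` returns only the directories below clusters/dev/adsd/, and omitting both filters returns the whole fleet.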

Follows the May 14 GitHub Provisioning Pipeline Demo (Manuel, Matthew, David, Delong).

- Document current provisioning pipeline, per-cluster update mechanism,
  and account repo (_apps-adsd-eks) injection pattern
- Clarify dual role of clusters/<name>/main.tf: cluster metadata AND
  injection location map; show repo_name vs cluster_name decoupling
- Add Step 0a: restructure clusters/ into lifecycle/team hierarchy
  (clusters/dev/adsd/, clusters/prod/ois/, etc.) with SC param threading
- update_all_clusters.py with --lifecycle/--team/--force flags
- GitHub Action for eks-fleet.code-workspace auto-maintenance
- Propose four AI agents + four Copilot skills for cognitive scale
- Replace all local path references with GHE URLs
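
As a rough illustration of the eks-fleet.code-workspace auto-maintenance item, the sketch below regenerates a VS Code multi-root workspace file from the cluster directory tree; a GitHub Action could run something like it on each push. The function names and the one-folder-per-cluster layout are assumptions, not the design chosen in this PR.

```python
import json
from pathlib import Path


def build_workspace(root):
    """Build a VS Code workspace dict with one folder entry per cluster directory.

    Assumes the clusters/<lifecycle>/<team>/<name>/main.tf layout from Step 0a.
    """
    root = Path(root)
    folders = []
    for main_tf in sorted(root.glob("*/*/*/main.tf")):
        lifecycle, team, name = main_tf.parent.relative_to(root).parts
        folders.append({
            "name": f"{lifecycle}/{team}/{name}",
            # Relative paths in a workspace file resolve against the file's location.
            "path": main_tf.parent.as_posix(),
        })
    return {"folders": folders}


def write_workspace(root, out_file="eks-fleet.code-workspace"):
    """Serialize the workspace as JSON so the file can be committed."""
    text = json.dumps(build_workspace(root), indent=2) + "\n"
    Path(out_file).write_text(text)
```

Because the output is sorted and deterministic, the Action would only need to commit when the generated file differs from the one already in the repository.
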
Dave Arnold added 2 commits May 15, 2026 13:16
- Add gap: clusters/ embedded inside module repo conflates module
  versioning with fleet operations
- New Step 0b: extract clusters/ to SCT-Engineering/terraform-eks-fleet;
  module source becomes versioned GHE ref instead of ../../
- Step 0a (hierarchy restructure) now happens inside terraform-eks-fleet
- Update task list: items 1-6 this sprint; 7-15 follow
- Update Part 2 target state to reference terraform-eks-fleet
- Update 'After Items' section heading
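
For Step 0b, each extracted main.tf needs its module source changed from the relative ../../ path to a versioned ref. The sketch below shows one possible way to do that rewrite, using Terraform's generic Git source form (`git::<url>?ref=<tag>`). The helper name, the GHE host, the repository name, and the tag are placeholders, not values from this PR.

```python
import re

# Matches a line such as:  source = "../../"   (trailing slash optional)
SOURCE_RE = re.compile(r'^(\s*source\s*=\s*)"\.\./\.\./?"', re.MULTILINE)


def pin_module_source(main_tf_text, repo_url, ref):
    """Replace a relative ../../ module source with a pinned Git source."""
    new_source = f"git::{repo_url}?ref={ref}"
    # A function replacement avoids any escape handling in the new string.
    return SOURCE_RE.sub(lambda m: f'{m.group(1)}"{new_source}"', main_tf_text)


# Hypothetical example values; substitute the real module repo URL and tag.
EXAMPLE = 'module "eks" {\n  source = "../../"\n  cluster_name = "alpha"\n}\n'
PINNED = pin_module_source(
    EXAMPLE, "https://ghe.example.com/ORG/MODULE-REPO.git", "v1.0.0"
)
```

Only the `source` line changes; every other argument in the module block is left as written, so cluster metadata in main.tf is unaffected.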